PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification

Authors

  • Ellie Pavlick
  • Pushpendre Rastogi
  • Juri Ganitkevitch
  • Benjamin Van Durme
  • Chris Callison-Burch

Abstract

We present a new release of the Paraphrase Database. PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0’s heuristic rankings. Each paraphrase pair in the database now also includes fine-grained entailment relations, word embedding similarities, and style annotations.

Similar resources

Classification of Entailment Relations in PPDB

This document outlines our protocol for labeling noun pairs according to the entailment relations proposed by Bill MacCartney in his 2009 thesis on Natural Language Inference. Our purpose in doing so is to build a labelled data set with which to train a classifier for differentiating between these relations. The classifier can be used to assign probabilities of each relation to the paraphrase...

From Paraphrase Database to Compositional Paraphrase Model and Back

The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates. However, it is still unclear how it can best be used, due to the heuristic nature of the confidences and its necessarily incomplete coverage. We propose models to leverage the phrase pairs from the PPDB to build parametric parap...

Adding Semantics to Data-Driven Paraphrasing

We add an interpretable semantics to the paraphrase database (PPDB). To date, the relationship between the phrase pairs in the database has been weakly defined as approximately equivalent. We show that in fact these pairs represent a variety of relations, including directed entailment (little girl/girl) and exclusion (nobody/someone). We automatically assign semantic entailment relations to ent...

Vector-space models for PPDB paraphrase ranking in context

The PPDB is an automatically built database which contains millions of paraphrases in different languages. Paraphrases in this resource are associated with features that are used to rank them and that reflect paraphrase quality. This context-unaware ranking captures the semantic similarity of paraphrases but cannot serve to estimate their adequacy in specific contexts. We propose to use vector-spac...

Second-Order Word Embeddings from Nearest Neighbor Topological Features

We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recogni...

Publication date: 2015